Mandated by the Brady Handgun Violence Prevention Act of 1993 and launched by the FBI on November 30, 1998, the National Instant Criminal Background Check System (NICS) is used by Federal Firearms Licensees to determine instantly whether a prospective buyer is eligible to purchase firearms. Before completing a sale, the dealer calls in a check to the FBI or to a designated state agency to verify that the customer has no disqualifying criminal record and is not otherwise prohibited from making the purchase.
The FBI provides data on the number of firearm checks by month, state, and type. This study examines the history of data from November 1998 through January 2018.
These statistics represent the number of firearm background checks initiated through the NICS.
Based on varying state laws and purchase scenarios, a one-to-one correlation cannot be made between a firearm background check and a firearm sale.
Many checks are for concealed carry permits, not actual gun sales. The FBI’s numbers do not include private gun sales, many of which do not require a background check. A study conducted by Harvard researchers found that roughly 40 percent of respondents had acquired their most recent firearm without going through a background check.
This study nevertheless uses a 1:1 ratio, treating each background check as a proxy for one gun sale.
Sales estimates are calculated from the handgun, long gun, and total background check counts.
This study aims to answer the following questions:
In addition to the FBI's NICS database, this study uses population data from the US Census Bureau. Standardizing the check counts on a per capita basis allows an unbiased comparison across states with different populations.
It is important to note that the population changes over time. When standardizing the data on a per capita basis, the population at that time must be used to compute the per capita figures accurately.
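The per capita computation can be sketched with hypothetical figures (the state names and populations below are illustrative, not taken from the census files):

```python
import pandas as pd

# Illustrative data only: two hypothetical states, two years
checks = pd.DataFrame({
    'state':  ['Alpha', 'Alpha', 'Beta', 'Beta'],
    'year':   [1999, 2000, 1999, 2000],
    'totals': [5000, 5200, 5000, 5200],
})
population = pd.DataFrame({
    'state': ['Alpha', 'Alpha', 'Beta', 'Beta'],
    'year':  [1999, 2000, 1999, 2000],
    'pop':   [1000000, 1050000, 4000000, 4100000],
})

# Match each check count to the population of that state AND that year,
# then standardize to checks per 100,000 residents
merged = checks.merge(population, on=['state', 'year'], how='left')
merged['per_100k'] = merged['totals'] / (merged['pop'] / 100000.0)
print(merged[['state', 'year', 'per_100k']])
```

Identical raw counts (5,000 checks) become very different per capita rates (500 vs. 125 per 100K), which is why the divisor must track both the state and the year.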
# Environment: Python 3.7, NumPy 1.16, Pandas 0.24, MatPlotLib 3.0
import numpy as np
import pandas as pd
import seaborn as sns
# Read in the NICS data
filename = '/Users/jeffkalmanson/Documents/Udacity_Data_Analyst/P2_Ver_2_Data_Analysis/nics-firearm-background-checks.csv'
nics_df = pd.read_csv(filename)
nics_df.head()
As each data file is loaded, display its data dictionary. This lists the variables available for the analysis, the imported data type of each, and the non-null count.
# nics_df.info()
| Column Name | Count | Data Type |
|---|---|---|
| month | 12705 | non-null string |
| state | 12705 | non-null string |
| permit | 12681 | non-null float64 |
| permit_recheck | 1320 | non-null float64 |
| handgun | 12685 | non-null float64 |
| long_gun | 12686 | non-null float64 |
| other | 5720 | non-null float64 |
| multiple | 12705 | non-null int64 |
| admin | 12682 | non-null float64 |
| prepawn_handgun | 10762 | non-null float64 |
| prepawn_long_gun | 10760 | non-null float64 |
| prepawn_other | 5335 | non-null float64 |
| redemption_handgun | 10765 | non-null float64 |
| redemption_long_gun | 10764 | non-null float64 |
| redemption_other | 5335 | non-null float64 |
| returned_handgun | 2420 | non-null float64 |
| returned_long_gun | 2365 | non-null float64 |
| returned_other | 2035 | non-null float64 |
| rentals_handgun | 1210 | non-null float64 |
| rentals_long_gun | 1045 | non-null float64 |
| private_sale_handgun | 2970 | non-null float64 |
| private_sale_long_gun | 2970 | non-null float64 |
| private_sale_other | 2970 | non-null float64 |
| return_to_seller_handgun | 2695 | non-null float64 |
| return_to_seller_long_gun | 2970 | non-null float64 |
| return_to_seller_other | 2475 | non-null float64 |
| totals | 12705 | non-null int64 |
Data columns: 27 columns
Range Index: 12705 entries, 0 to 12704
During import, some cleanup can be applied directly, such as stripping the thousands separator from the numbers.
# Read in the Census data 2010 - 2017
filename = '/Users/jeffkalmanson/Documents/Udacity_Data_Analyst/P2_Ver_2_Data_Analysis/Census_population_state_10_17.csv'
census_2010_2017_df = pd.read_csv(filename, thousands = ',') # Remove thousands separator
census_2010_2017_df.head()
# census_2010_2017_df.info()
| Column Name | Count | Data Type |
|---|---|---|
| Geography | 52 | non-null string |
| April 1, 2010 - Census | 52 | non-null int64 |
| April 1, 2010 - Estimates Base | 52 | non-null int64 |
| Population Estimate - 2010 | 52 | non-null int64 |
| Population Estimate - 2011 | 52 | non-null int64 |
| Population Estimate - 2012 | 52 | non-null int64 |
| Population Estimate - 2013 | 52 | non-null int64 |
| Population Estimate - 2014 | 52 | non-null int64 |
| Population Estimate - 2015 | 52 | non-null int64 |
| Population Estimate - 2016 | 52 | non-null int64 |
| Population Estimate - 2017 | 52 | non-null int64 |
Data columns: 11 columns
Range Index: 52 entries, 0 to 51
# Read in the Census data 2000 - 2010
filename = '/Users/jeffkalmanson/Documents/Udacity_Data_Analyst/P2_Ver_2_Data_Analysis/Intercensal_pop_est_2000-2010.csv'
census_2000_2010_df = pd.read_csv(filename, thousands = ',') # Remove thousands separator
census_2000_2010_df.head()
# census_2000_2010_df.info()
| Column Name | Count | Data Type |
|---|---|---|
| Geographic Area | 56 | non-null string |
| April 1, 2000 | 56 | non-null int64 |
| Intercensal-2000 | 56 | non-null int64 |
| Intercensal-2001 | 56 | non-null int64 |
| Intercensal-2002 | 56 | non-null int64 |
| Intercensal-2003 | 56 | non-null int64 |
| Intercensal-2004 | 56 | non-null int64 |
| Intercensal-2005 | 56 | non-null int64 |
| Intercensal-2006 | 56 | non-null int64 |
| Intercensal-2007 | 56 | non-null int64 |
| Intercensal-2008 | 56 | non-null int64 |
| Intercensal-2009 | 56 | non-null int64 |
| April 1, 2010 | 56 | non-null int64 |
| July 1, 2010 | 56 | non-null int64 |
Data columns: 14 columns
Range Index: 56 entries, 0 to 55
# Read in the Census data 1998 - 1999
filename = '/Users/jeffkalmanson/Documents/Udacity_Data_Analyst/P2_Ver_2_Data_Analysis/State_pop_1998-1999.txt'
column_names = ['Area Name', 'Pop Est July 1 1999', 'Pop Est July 1 1998', 'Numeric Change', 'Percent Change',
'Births', 'Deaths', 'Net Int\'l Migration', 'Net Domestic Migration']
census_1998_1999_df = pd.read_table(filename, delimiter=r'\s+', names=column_names, skiprows=3)
# r'\s+' treats any run of whitespace as the delimiter
census_1998_1999_df.head()
# census_1998_1999_df.info()
| Column Name | Count | Data Type |
|---|---|---|
| Area Name | 65 | non-null string |
| Pop Est July 1 1999 | 65 | non-null int64 |
| Pop Est July 1 1998 | 65 | non-null int64 |
| Numeric Change | 65 | non-null int64 |
| Percent Change | 65 | non-null float64 |
| Births | 65 | non-null int64 |
| Deaths | 65 | non-null int64 |
| Net Int'l Migration | 65 | non-null int64 |
| Net Domestic Migration | 65 | non-null int64 |
Data columns: 9 columns
Range Index: 65 entries, 0 to 64
After importing, the data is cleaned to be usable in the analysis.
In the Census data 2000 - 2010, the states have a period before the state name, for example, ".California"
In the Census data 1998 - 1999, the geographic names have an underscore instead of a space, for example, "New_York"
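One caveat before the cleanup: in the pandas versions used here, `Series.str.replace` treats its pattern as a regular expression, and a bare `'.'` matches any character. Passing `regex=False` (or escaping the dot) keeps the replacement literal. A minimal illustration:

```python
import pandas as pd

s = pd.Series(['.California', '.New York'])

# regex=True: '.' is a wildcard that matches every character, erasing the strings
print(s.str.replace('.', '', regex=True).tolist())    # ['', '']
# regex=False: only the literal periods are removed
print(s.str.replace('.', '', regex=False).tolist())   # ['California', 'New York']
```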
# Remove the period from the state name in the Census data 2000 - 2010
# Select the column named Geographic Area using all rows
states_period = census_2000_2010_df.loc[:, 'Geographic Area']
# Scan through the values and strip the literal period
# Test and check the results first
fixit = states_period.str.replace('.', '', regex=False)  # regex=False: a bare '.' would otherwise match any character
print(fixit.head(10))
print()
# Looks good, now fix it in the data frame
census_2000_2010_df['Geographic Area'] = census_2000_2010_df['Geographic Area'].str.replace('.', '', regex=False)
census_2000_2010_df.head(10)
# Remove the underscore from the state name in the Census data 1998 - 1999 and replace it with a space
# Select the column named Area Name using all rows
states_underscore = census_1998_1999_df.loc[ : , 'Area Name']
# Scan through the values, remove the underscore, and insert a space
# Test and check results
fixit = states_underscore.str.replace('_', ' ') # replace() operates on a Series
print (fixit.head(10))
print ()
# Looks good, now fix it in the data frame
census_1998_1999_df['Area Name'] = census_1998_1999_df.loc[ : , 'Area Name'].str.replace('_', ' ')
census_1998_1999_df.head(10)
Check the data types of the imported data and convert to the correct type if necessary. For example, if a number was imported as a string, convert it to an integer or float.
nics_df.dtypes
In the NICS dataset, the month field was imported as a string. Convert it to a datetime.
import datetime
nics_df['month'] = pd.to_datetime(nics_df['month'], yearfirst=True, format='%Y-%m')
nics_df.head(10)
nics_df.dtypes # Check the conversion
nics_df['month'].values # Take a peek at the values
Check the data types for the Census data.
census_2010_2017_df.dtypes
census_2000_2010_df.dtypes
census_1998_1999_df.dtypes
The Census data were correctly imported as numbers.
At this point, the data is loaded and cleaned. Move and arrange the data into the desired data structures to be used in the computation.
Merge the census data into one dataset spanning all dates (1998 to 2017).
# Merge the data from 2000 - 2017
census_all_2000_df = census_2000_2010_df.merge(census_2010_2017_df, left_on='Geographic Area',
right_on='Geography', how='inner')
pd.set_option('display.max_columns', 30)
census_all_2000_df.head(10)
# Merge in the data from 1998 - 1999 for all data 1998 - 2017
census_all_df = census_1998_1999_df.merge(census_all_2000_df, left_on='Area Name' , right_on='Geographic Area',
how='inner')
census_all_df.head(10)
In the unified census data structure, remove the columns not used in the analysis. Smaller structures reduce memory usage, and less clutter helps avoid errors.
# Remove unneeded columns from the merged Census data frame
remove_cols = ['April 1, 2010 - Census', 'April 1, 2010 - Estimates Base', 'April 1, 2010', 'July 1, 2010',
'April 1, 2000', 'Geography', 'Numeric Change', 'Percent Change', 'Births', 'Deaths',
'Net Int\'l Migration', 'Net Domestic Migration', 'Geographic Area']
census_all_df.drop(labels=remove_cols, axis='columns', inplace=True, errors='raise')
census_all_df.head(10)
census_all_df.dtypes # Check the result
With only the essential data in the set, rename the data columns to avoid confusion and errors.
# Rename the columns in the merged Census data frame
census_all_df.rename(inplace=True, index=str, columns={"Pop Est July 1 1999": "1999", "Pop Est July 1 1998": "1998",
"Intercensal-2000": "2000", "Intercensal-2001": "2001",
"Intercensal-2002": "2002", "Intercensal-2003": "2003",
"Intercensal-2004": "2004", "Intercensal-2005": "2005",
"Intercensal-2006": "2006", "Intercensal-2007": "2007",
"Intercensal-2008": "2008", "Intercensal-2009": "2009",
"Population Estimate - 2010": "2010", "Population Estimate - 2011": "2011",
"Population Estimate - 2012": "2012", "Population Estimate - 2013": "2013",
"Population Estimate - 2014": "2014", "Population Estimate - 2015": "2015",
"Population Estimate - 2016": "2016", "Population Estimate - 2017": "2017" })
census_all_df.head(10)
Rearrange certain data columns to structure the data frame in a logical order.
# Swap the 1999 and 1998 columns in the merged Census data frame
census_all_df = census_all_df.reindex(columns=['Area Name','1998','1999','2000','2001','2002','2003','2004','2005',
'2006','2007','2008','2009','2010','2011','2012','2013','2014',
'2015','2016','2017'] )
census_all_df.head(10)
census_all_df.dtypes # Run a check
At this point, the Census population dataset is ready for the analysis.
Attention is turned to the NICS data.
Remove unused data from the NICS dataset.
# NICS cleanup -- remove the territories, which are outside this analysis
remove_set = ['Guam', 'Mariana Islands', 'Puerto Rico', 'Virgin Islands']
df = nics_df[nics_df['state'].isin(remove_set)]  # Boolean mask selects the territory rows --> returns a DataFrame
nics_df.drop(df.index, inplace=True)
nics_df.head(10)
Run a count on the cleaned-up NICS data frame to check the dataset.
nics_df.count()
Both datasets are all set for the computation.
Visualize the method of analysis and explain graphically how it will be performed; it is important to have a clear picture of the solution before any computations are run.


For the background check dataset, group the data by year and state. This is done to select the data to be divided by the population for that year and state to compute the per capita values. This analysis is careful to accurately compute the per capita data using the population at the given time. For a particular state and the twelve months of that year, the census population for that state and year will be the divisor.
# NICS data: group by year, then by state
group_by_year_state_df = nics_df.groupby([(nics_df['month']).dt.year, 'state'], as_index=False) # Do not use keys as index
list(group_by_year_state_df)[208:209] # Use list() to display the groups
The year, state groups for the firearm check data are created. In the census population dataset, set the state as the data index. This allows the population number to be referenced by the state index and year column.
# In the Census data frame, set column 'Area Name' as the Index
census_all_df.set_index('Area Name', inplace=True)
print (census_all_df.loc['California', '1998'])
census_all_df.head(10)
Divide the census population numbers by 100,000 to scale the magnitude of the values down. This will maintain the NICS per capita numbers in a reasonable computation range.
# Per capita 100K
# Divide the census population by 100,000 to scale the numbers
census_scale_df = census_all_df.div(100000.0, axis='columns')  # 100000.0 keeps the division in floating point
census_scale_df.head(10)
Run a check on the grouped NICS data before the per capita calculation.
# Check the counts for the Group By data frame
# Should be 12 for months in a year
group_by_year_state_df.count().iloc[202:210, [0,1,2,3,4,5,6,25]]
In order to impartially compare the firearm checks across the fifty states on a relative basis, standardize the NICS numbers by dividing them by the scaled census population pertaining to that state for that year.
This provides the number of firearm checks per capita per 100K.
Population changes over time: it generally increases, though it sometimes decreases. To standardize the firearm checks accurately, this study uses the annual state populations provided by the census data.
list(group_by_year_state_df)[208:209]
# Create a new DataFrame for the divided NICS data (standardized)
def standardize(f_group_by_year_state_df):
    scaled_nics_df = pd.DataFrame()
    f_std_nics_df = pd.DataFrame()  # Standardized NICS DataFrame
    for year_state, group in f_group_by_year_state_df:
        yr = str(year_state[0])
        st = year_state[1]
        if yr != '2018':
            cen = census_scale_df.loc[st, yr]
        else:
            cen = census_scale_df.loc[st, '2017']  # NICS data extends to Jan 2018; use the 2017 census for Jan 2018
        group.reset_index(drop=True, inplace=True)  # Reset the groupby index to the default range
        # The index should run 0 to 11 for the 12 months of the year
        scaled_nics_df = pd.DataFrame(group, columns=['month', 'state', 'handgun', 'long_gun', 'totals'])
        # Divide by the scaled census population to standardize
        scaled_nics_df[['handgun', 'long_gun', 'totals']] = \
            scaled_nics_df[['handgun', 'long_gun', 'totals']].apply(lambda x: x / cen)
        # Could use df.divide(cen) instead of df.apply()
        f_std_nics_df = f_std_nics_df.append(scaled_nics_df)  # Build the standardized NICS DataFrame by appending
    f_std_nics_df.rename(columns={'month': 'Month', 'state': 'State', 'handgun': 'Handgun',
                                  'long_gun': 'Long Gun', 'totals': 'Total'}, inplace=True)
    return f_std_nics_df
std_nics_df = standardize(group_by_year_state_df)
std_nics_df[1984:2002]
The dataset with the standardized NICS numbers is created. Review the summary statistics.
std_nics_df.describe()
The standard deviations are high! For example, in the 'Total' firearm checks, the mean is 472 checks per capita 100K, but the standard deviation is 608.
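Part of that spread comes from pooling states with very different levels into a single summary; a minimal sketch with made-up numbers shows how the pooled dispersion can dwarf the within-state spread:

```python
import pandas as pd

# Made-up numbers: a low-volume and a high-volume state pooled together
df = pd.DataFrame({
    'State': ['Low', 'Low', 'Low', 'High', 'High', 'High'],
    'Total': [100.0, 110.0, 90.0, 900.0, 950.0, 850.0],
})

pooled_std = df['Total'].std()                          # mixes both levels
per_state = df.groupby('State')['Total'].agg(['mean', 'std'])
print(pooled_std)
print(per_state)
```

The pooled standard deviation is an order of magnitude larger than either state's own spread, which motivates examining the states individually in the plots that follow.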
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
%matplotlib inline
def graphit(stdz_nics_df, ky_label, y_end, extra):
    fig, ax = plt.subplots(figsize=(22, 30), dpi=300)  # width, height in inches
    # Remove the plot frame lines
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.format_xdata = mdates.DateFormatter('%Y')
    ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
    for key, group in stdz_nics_df.groupby(['State']):  # Group by State; plot the time series of Total
        ax = group.plot(ax=ax, kind='line', x='Month', y='Total', label=key)
    ax.legend(loc='best', fancybox=True, framealpha=0.2)
    years = mdates.YearLocator()    # every year
    months = mdates.MonthLocator()  # every month
    yearsFmt = mdates.DateFormatter('%Y')
    # Format the ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    # Set the axis ranges
    ax.set_xlim([datetime.date(1998, 10, 1), datetime.date(2018, 1, 1)])
    ax.set_ylim([0, y_end])
    ax.tick_params(axis='x', which='both', bottom=True, direction='out', labelbottom=True,
                   labelsize=12.0, length=8.0, width=2.0, labelrotation=90.0)
    ax.tick_params(axis='y', which='both', left=True, direction='out', labelleft=True,
                   labelsize=12.0, length=8.0, width=2.0)
    if ky_label:
        plt.text(datetime.date(2015, 10, 1), 8400, 'Kentucky')  # Label the Kentucky line
    y_step = 500.0
    if y_end < 1000:
        y_step = 100.0
    plt.yticks(np.arange(0, y_end + 1.0, y_step))
    plt.xlabel('Year', fontsize=12)
    plt.ylabel(''.join(['Total Firearm Checks per Capita 100K', extra]), fontsize=12)
    plt.title(''.join(['Monthly New Gun Sales Using Firearm Checks as a Proxy', extra]), fontsize=18)  # Standardized NICS data
    return plt.show()
ky_label = True
y_end = 9500
extra =''
graphit(std_nics_df, ky_label, y_end, extra)
There are several spikes in the data. The most prominent is the State of Kentucky (black line).
The data demonstrates a distinct cyclical nature.
These observations are discussed in further detail below.
The graph shows a sudden, huge increase in firearm checks in July 2006 for the state of Kentucky. This is suspicious!
Research into the question revealed the following: In July 2006, Kentucky implemented a new state law that required automatic monthly background checks on every holder of concealed carry permits in the state. Kentucky is the only state with such a policy.
Regarding the data, the automatic monthly background checks for existing concealed carry permit holders are included in the Total monthly firearm checks. To correct this, subtract out the Permit number from the Total for Kentucky from July 2006 onward.
The computation must select only the State of Kentucky and only the months after July 2006. For the selected data, perform the subtraction, leaving all other data as is. Visualize the methodology.

# Filter out the data for Kentucky from July 2006 onwards
nics_df.query('state == "Kentucky" & month >= "2006-07-01"').head(10)
# Correct the State of Kentucky NICS data
# Subtract 'permit' from 'totals' from July 2006 onward
original_totals = nics_df['totals'].copy()
nics_ky_df = nics_df.copy()
temp = nics_ky_df.query('state == "Kentucky" & month >= "2006-07-01"')  # Filter the affected rows
nics_ky_df['totals'] = temp['totals'].subtract(temp['permit'])  # Index alignment leaves every other row as NaN
nics_ky_df.totals.fillna(original_totals, inplace=True)  # Restore the original totals in the NaN rows
nics_ky_df.head(20)
# Check the data frame
display(nics_ky_df.iloc[7052:7060, [0,1,2,3,4,5,6,7,26]])
display(nics_ky_df.iloc[7103:7110, [0,1,2,3,4,5,6,7,26]])
display(nics_ky_df.iloc[7152:7160, [0,1,2,3,4,5,6,7,26]])
Examine the data types and counts.
nics_ky_df.info()
With the State of Kentucky data corrected, redo the grouping analysis and graphing.
# NICS data, Kentucky fixed: group by year, then by state
group_by_year_state_df = nics_ky_df.groupby([(nics_ky_df['month']).dt.year, 'state'])
list(group_by_year_state_df)[208:209] # Use list() to display the groups
# Check the counts
group_by_year_state_df.count().iloc[202:210, [0,1,2,3,4,5,6,25]]
## Create a new DataFrame for divided NICS data, Kentucky corrected
std_nics_df = standardize(group_by_year_state_df)
display(std_nics_df[1984:2000])
display (std_nics_df[4588:4604])
Review the summary statistics.
std_nics_df.describe()
The numbers in the Total column are looking much more reasonable. The maximum value has dropped dramatically.
# MatPlotLib
ky_label = False
y_end = 6000
extra = ''
graphit(std_nics_df, ky_label, y_end, extra) # Plot the graph
The State of Kentucky is corrected and is no longer conspicuous. There are a few spikes in the data that are investigated next.
During 2010 and 2011, Utah (purple line) relaxed its state laws regarding gun ownership, concealed carry, and types of firearms allowed. People went out and bought guns.
In June 2014, North Carolina (red line) repealed its firearm registration law.
At the beginning of 2016, Indiana (second purple line) eased its gun carry law.
A distinct cyclical, seasonal trend is observed: checks spike around Black Friday, the day after Thanksgiving, when retailers offer large discounts.
When state gun laws change, sales of firearms are affected. People purchase before restrictions are implemented, or after laws are loosened.
Mass shootings and terrorist attacks also affect firearm sales.
There is a very slight overall upward trend indicative of market growth (at a low rate).
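The seasonal cycle can be made explicit by averaging the standardized totals over calendar months. A sketch on synthetic data (the values are invented, with a deliberate November/December bump standing in for the Black Friday effect):

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a November/December bump
rng = pd.date_range('1999-01-01', '2002-12-01', freq='MS')  # month starts
values = np.where(rng.month >= 11, 900.0, 400.0)
series = pd.DataFrame({'Month': rng, 'Total': values})

# Average by calendar month to expose the seasonal pattern
seasonal = series.groupby(series['Month'].dt.month)['Total'].mean()
print(seasonal)
```

Applied to the real `std_nics_df`, the same groupby would quantify how much the late-year months exceed the rest of the year.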
Run data sorts to analyze how many guns are sold per capita in the different states over the full time period.
# For all the months, sort by Total
std_nics_df.loc[:, ['Month', 'State', 'Total']].sort_values('Total', ascending=False,
inplace=False, kind='quicksort',
na_position='last')[0:20]
Utah sold the most guns per capita in a month when its gun laws changed to ease restrictions.
# For all the months, sort by Long Gun
std_nics_df.loc[:, ['Month', 'State', 'Long Gun']].sort_values('Long Gun', ascending=False,
inplace=False, kind='quicksort',
na_position='last')[0:10]
The most rifles per capita sold in a month were in the Dakotas and Montana.
# For all the months, sort by Handgun
std_nics_df.loc[:, ['Month', 'State', 'Handgun']].sort_values('Handgun', ascending=False, inplace=False,
kind='quicksort', na_position='last')[0:10]
The most handguns per capita sold in a month were in Alabama.
The most new guns per capita sold in a given month were in Utah after the state firearm laws were relaxed. A total of 5572 firearms per capita 100K were sold at the end of year 2010. That equates to 0.06 new guns sold per person, or one new gun per 18 people.
Monthly per capita sales of rifles (long guns) run higher than handguns; the single-month records were set in South Dakota (long guns) and Alabama (handguns), during the Black Friday months of 2012 and 2015 respectively.
| Firearm Type | Number of People per One New Gun |
|---|---|
| Total | 18 |
| Long Gun | 93 |
| Handgun | 102 |
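The conversion behind the table is a simple inversion: checks per 100,000 residents invert to residents per check. A quick arithmetic check of the Utah peak figure of 5572 from the sort above:

```python
# Convert checks per 100,000 residents into "people per one new gun"
def people_per_gun(checks_per_100k):
    return 100000.0 / checks_per_100k

print(round(people_per_gun(5572)))   # about 18 people per new gun
print(round(5572 / 100000.0, 2))     # about 0.06 new guns per person
```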
# Perform a sort to find the top 3 and bottom 3 states for each month
# Sorted by Total
std_nics_month_df = std_nics_df.groupby('Month')
# list(std_nics_month_df)  # Use list() to display the groups

def sort_by_month(to_sort):
    f_result_df = pd.DataFrame(index=range(0, 50), columns=[])
    f_top_3 = pd.Series([])
    f_bottom_3 = pd.Series([])
    for name, group in std_nics_month_df:  # Group by month
        # For each month, rank the 50 states by the chosen column
        sorted_df = group.loc[:, ['State', to_sort]].sort_values(to_sort, ascending=False,
                                                                 kind='quicksort', na_position='last')[0:50]
        the_series = pd.DataFrame(sorted_df['State'])
        the_series.index = range(0, 50)
        the_series.columns = [name.strftime("%Y-%m")]
        f_result_df = f_result_df.merge(the_series, left_index=True, right_index=True, how='outer')  # Build the data frame
        # Python slice stops are exclusive: [0:3] takes ranks 0-2, [47:50] takes ranks 47-49
        f_top_3 = f_top_3.append(f_result_df[name.strftime("%Y-%m")][0:3], ignore_index=True)
        f_bottom_3 = f_bottom_3.append(f_result_df[name.strftime("%Y-%m")][47:50], ignore_index=True)
    f_top_3 = f_top_3.unique()
    f_bottom_3 = f_bottom_3.unique()
    return (f_result_df, f_top_3, f_bottom_3)  # Return a tuple
t = sort_by_month('Total')
result_df = t[0]
top_3 = t[1]
bottom_3 = t[2]
pd.set_option('display.max_columns', 500)
print (top_3.tolist())
result_df.loc[0:2, :]
# Bottom 3 states by Total
print (bottom_3.tolist()) # Sorted by 'Total'
result_df.loc[47:50, :]
Count the number of months each state ranks in the top 3 or bottom 3.
# Perform a count for the top 3 states
# How many months does the state rank in the top 3
# Sorted by Total
from collections import Counter
z = result_df[0:3].values.flatten().tolist()  # Ranks 0-2: the top 3 rows
top_3_total = Counter(z).most_common()
top_3_total
# Perform a count for the bottom 3 states
# How many months does the state rank in the bottom 3
# Sorted by Total
from collections import Counter
z = result_df[47:50].values.flatten().tolist()
bottom_3_total = Counter(z).most_common()
bottom_3_total
# Perform a sort to find the top 3 and bottom 3 states for each month
# Sorted by Long Gun
t = sort_by_month('Long Gun')
result_df = t[0]
top_3 = t[1]
bottom_3 = t[2]
pd.set_option('display.max_columns', 500)
print (top_3.tolist())
result_df.loc[0:2, :]
# Bottom 3 states by Long Gun
print (bottom_3.tolist()) # Sorted by Long Gun
result_df.loc[47:50, :]
# Perform a count for the top 3 states
# How many months does the state rank in the top 3
# Sorted by Long Gun
from collections import Counter
z = result_df[0:3].values.flatten().tolist()  # Ranks 0-2: the top 3 rows
top_3_long_gun = Counter(z).most_common()
top_3_long_gun
# Perform a count for the bottom 3 states
# How many months does the state rank in the bottom 3
# Sorted by Long Gun
from collections import Counter
z = result_df[47:50].values.flatten().tolist()
bottom_3_long_gun = Counter(z).most_common()
bottom_3_long_gun
# Perform a sort to find the top 3 states for each month
# Sorted by Handgun
t = sort_by_month('Handgun')
result_df = t[0]
top_3 = t[1]
bottom_3 = t[2]
pd.set_option('display.max_columns', 500)
print (top_3.tolist())
result_df.loc[0:2, :]
# Bottom 3 states by Handgun
print (bottom_3.tolist()) # Sorted by Handgun
result_df.loc[47:50, :]
# Perform a count for the top 3 states
# How many months does the state rank in the top 3
# Sorted by Handgun
from collections import Counter
z = result_df[0:3].values.flatten().tolist()  # Ranks 0-2: the top 3 rows
top_3_handgun = Counter(z).most_common()
top_3_handgun
# Perform a count for the bottom 3 states
# How many months does the state rank in the bottom 3
# Sorted by Handgun
from collections import Counter
z = result_df[47:50].values.flatten().tolist()
bottom_3_handgun = Counter(z).most_common()
bottom_3_handgun
View the data in a time series plot of the standardized total firearm checks per state.
# Graphical plots of Top 7 states by Total
top_set = []
for element in top_3_total:
    top_set.append(element[0])
print('The Top 7 States by Total are:')
print(top_set[0:7])
print()
df_top = std_nics_df[std_nics_df['State'].isin(top_set[0:7])]  # Boolean mask selects the Top 7 rows
print(df_top.describe())
# MatPlotLib
ky_label = False
y_end = 6000
extra = '\nTop 7 States'
graphit(df_top, ky_label, y_end, extra) # Plot the graph
The Top 7 States mimic the graph of all the states.
Notable, are the strong spikes in the State of Utah data (purple line). During 2010 and 2011, Utah relaxed its state laws regarding gun ownership, concealed carry, and types of firearms allowed. When state gun laws change, sales of firearms are affected. People purchase before restrictions are implemented, or after laws are loosened.
At the beginning of 2016, Indiana (green line) eased its gun carry law.
All Top 7 States spike in January 2013: The Assault Weapons Ban of 2013 was a bill introduced into Congress by Senator Dianne Feinstein in January 2013, one month after the Sandy Hook Elementary School shooting. It was defeated in the US Senate in April 2013.
A clear cyclical, seasonal trend is observed. This is Black Friday, the day after Thanksgiving when all the retailers offer large discounts.
There is a slight overall upward trend indicative of market growth (at a low rate).
# Graphical plots of Bottom 6 states by Total
bottom_set = []
for element in bottom_3_total:
    bottom_set.append(element[0])
print('The Bottom 6 States by Total are:')
print(bottom_set[0:6])
print()
df_bottom = std_nics_df[std_nics_df['State'].isin(bottom_set[0:6])]  # Boolean mask selects the Bottom 6 rows
print(df_bottom.describe())
# MatPlotLib
ky_label = False
y_end = 860
extra = '\nBottom 6 States'
graphit(df_bottom, ky_label, y_end, extra) # Plot the graph
There are two notable spikes in the data: the State of Maryland (orange line), and the Assault Weapons Ban bill of January 2013, as explained below.
State of Maryland: In May 2013, Maryland State Law imposed significant new restrictions on gun ownership. People purchased guns before the new laws went into effect.
All Bottom 6 States spike in January 2013: The Assault Weapons Ban of 2013 was a bill introduced into Congress in January 2013, one month after the Sandy Hook Elementary School shooting. It was defeated in the US Senate in April 2013.
In the states that sell fewer guns per capita (bottom states), the cyclical, seasonal trend is muted.
Plot bar charts to visually examine the results.
import matplotlib.pyplot as plt
top_total_names = [ ]
top_total_values = [ ]
top_long_gun_names = [ ]
top_long_gun_values = [ ]
top_handgun_names = [ ]
top_handgun_values = [ ]
def top_gun(top_tuple, names, vals):  # Unpack (state, count) pairs into parallel lists
    for element in top_tuple:
        names.append(element[0])
        vals.append(element[1])
    return
top_gun(top_3_total[0:3], top_total_names, top_total_values)
top_gun(top_3_long_gun[0:3], top_long_gun_names, top_long_gun_values)
top_gun(top_3_handgun[0:3], top_handgun_names, top_handgun_values)
ind = [0,1,2] # the x locations for the groups
width = 0.4 # the width of the bars
fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharex=False, sharey=False)
rects1 = ax1.bar(ind, top_total_values, width, align='center')
rects2 = ax2.bar(ind, top_long_gun_values, width, align='center')
rects3 = ax3.bar(ind, top_handgun_values, width, align='center')
def bar_color(bars):
    bars[0].set_color('mediumseagreen')
    bars[1].set_color('deepskyblue')
    bars[2].set_color('burlywood')
    return
bar_color(rects1)
bar_color(rects2)
bar_color(rects3)
def chart_axis(axit, yit, ti, lab):  # Add text for labels, title and axis ticks
    axit.set_ylabel(yit)
    axit.set_title(ti)
    axit.set_xticks(ind)
    axit.set_xticklabels(lab)
    return
chart_axis(ax1, 'Total Firearm Checks per Capita 100K\nNo. of Months State in Top 3', 'Top 3 States by Total', top_total_names)
chart_axis(ax2, 'Long Gun Checks per Capita 100K\nNo. of Months State in Top 3', 'Top 3 States by Long Guns', top_long_gun_names)
chart_axis(ax3, 'Handgun Checks per Capita 100K\nNo. of Months State in Top 3', 'Top 3 States by Handguns', top_handgun_names)
print ()
fig.tight_layout()
plt.show()
print ()
Note that the vertical axis shows the number of months (frequency) each state appeared among the top three.
The states of Montana, West Virginia, and Alaska stand out in the top 3.
bottom_total_names = []
bottom_total_values = []
bottom_long_gun_names = []
bottom_long_gun_values = []
bottom_handgun_names = []
bottom_handgun_values = []

def bottom_gun(bottom_tuple, names, vals):
    # Same tuple-splitting helper as top_gun, for the bottom 3 states
    for element in bottom_tuple:
        names.append(element[0])
        vals.append(element[1])

bottom_gun(bottom_3_total[0:3], bottom_total_names, bottom_total_values)
bottom_gun(bottom_3_long_gun[0:3], bottom_long_gun_names, bottom_long_gun_values)
bottom_gun(bottom_3_handgun[0:3], bottom_handgun_names, bottom_handgun_values)
ind = [0, 1, 2]  # the x locations for the groups
width = 0.4      # the width of the bars

fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(14, 4), sharex=False, sharey=False)
rects1 = ax1.bar(ind, bottom_total_values, width, align='center')
rects2 = ax2.bar(ind, bottom_long_gun_values, width, align='center')
rects3 = ax3.bar(ind, bottom_handgun_values, width, align='center')

def bar_color(bars):
    bars[0].set_color('palevioletred')
    bars[1].set_color('lightgray')
    bars[2].set_color('lightsalmon')

bar_color(rects1)
bar_color(rects2)
bar_color(rects3)

def chart_axis(axit, yit, ti, lab):
    # Add text for labels, title, and axis ticks
    axit.set_ylabel(yit)
    axit.set_title(ti)
    axit.set_xticks(ind)
    axit.set_xticklabels(lab)

chart_axis(ax1, 'Total Firearm Checks per Capita 100K\nNo. of Months State in Bottom 3', 'Bottom 3 States by Total', bottom_total_names)
chart_axis(ax2, 'Long Gun Checks per Capita 100K\nNo. of Months State in Bottom 3', 'Bottom 3 States by Long Guns', bottom_long_gun_names)
chart_axis(ax3, 'Handgun Checks per Capita 100K\nNo. of Months State in Bottom 3', 'Bottom 3 States by Handguns', bottom_handgun_names)

fig.tight_layout()
plt.show()
The State of New Jersey and the District of Columbia (Washington, DC) stand out in the bottom 3.
# Select the data for Montana -- the top gun state
std_nics_df.query('State == "Montana"').describe()
# Select the data for New Jersey -- the bottom gun state
std_nics_df.query('State == "New Jersey"').describe()
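The two `describe()` summaries can also be placed side by side for direct comparison. A sketch on a stand-in frame (the column name `totals_per_capita` and the values here are assumptions; the real analysis would use `std_nics_df` directly):

```python
import pandas as pd

# Stand-in for std_nics_df: per-capita totals for two states (made-up values)
std_nics_df = pd.DataFrame({
    'State': ['Montana'] * 3 + ['New Jersey'] * 3,
    'totals_per_capita': [850.0, 900.0, 800.0, 60.0, 70.0, 50.0],
})

# One column of summary statistics per state, in a single table
comparison = pd.concat(
    {
        'Montana': std_nics_df.query('State == "Montana"')['totals_per_capita'].describe(),
        'New Jersey': std_nics_df.query('State == "New Jersey"')['totals_per_capita'].describe(),
    },
    axis=1,
)
print(comparison)
```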
This study examines firearm background checks as a proxy for gun sales. Because it focuses on new purchases, it does not account for firearms people already own.
The most new guns per capita are sold in Montana, West Virginia, and Alaska; the fewest are sold in New Jersey and the District of Columbia.
Long guns are more popular than handguns.
Black Friday is the single busiest day for gun sales.

| State | Number of People per One New Gun |
|---|---|
| Top State -- Montana | 117 |
| Bottom State -- New Jersey | 1,724 |
There is an order of magnitude difference in new gun sales between the top and bottom states.
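The figures in the table follow directly from the per-capita rates: annualize the average monthly checks per 100,000 residents, then invert to get residents per new gun. A sketch with hypothetical monthly rates (assumed values, chosen to reproduce the table's figures):

```python
# Hypothetical average monthly checks per 100K residents (assumed values,
# not taken from the dataset; chosen to match the table above)
monthly_rate_per_100k = {'Montana': 71.2, 'New Jersey': 4.834}

# people per gun = population base / annualized checks per that base
results = {}
for state, rate in monthly_rate_per_100k.items():
    annual_per_100k = rate * 12                # checks per 100K residents per year
    results[state] = 100_000 / annual_per_100k  # residents per one new gun
    print(f'{state}: one new gun per {results[state]:.0f} people')
```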
The study uses firearm checks as a proxy for new gun sales; secondhand sales are not considered, nor are firearms already in circulation. A future research project should attempt to estimate the number of firearms currently in the public's possession.
This section was added to demonstrate the coding required to read the provided data file, which contains data in several different formats.
# Read in the provided Census data.
# df.apply() operates on a Series (a column or row of the df);
# here df.applymap() is used instead to apply a function to every element.

def fix_dollar_percent(element):
    # Clean a single cell: strip '$' from dollar amounts and convert
    # percentages to fractions; leave everything else unchanged
    if not isinstance(element, str):  # numeric cells were already parsed by read_csv
        return element
    if '$' in element:
        return element.strip('$')     # strip() removes the character from either end of the string
    if '%' in element:
        return float(element.strip('%')) / 100.0
    return element
skip = list(range(66, 91))  # row numbers to skip when reading the file
filename = '/Users/jeffkalmanson/Documents/Udacity_Data_Analyst/P2_Ver_2_Data_Analysis/project_files/U.S. Census Data.csv'
us_census_df = pd.read_csv(filename, thousands=',', skiprows=skip)  # treat ',' as the thousands separator
us_census_df.drop(columns='Fact Note', inplace=True)
# display(us_census_df)
us_census_df = us_census_df.applymap(fix_dollar_percent)  # apply the cleaning function to each element
us_census_df
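To illustrate the cleaning step in isolation, the same logic can be run on a tiny frame containing the three cell formats found in the Census file. The function is repeated here so the snippet is self-contained, and the sample values are made up:

```python
import pandas as pd

def fix_dollar_percent(element):
    # Mirrors the cleaning function above: only strings need fixing,
    # since numeric columns were already parsed by read_csv
    if not isinstance(element, str):
        return element
    if '$' in element:
        return element.strip('$')
    if '%' in element:
        return float(element.strip('%')) / 100.0
    return element

# Illustrative cells: a dollar amount, a percentage, and plain text
sample = pd.DataFrame({'value': ['$55,322', '12.5%', 'plain text']})
cleaned = sample.applymap(fix_dollar_percent)
print(cleaned)
```

Dollar signs are stripped, percentages become fractions, and other cells pass through untouched.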